Continuous Audio Object Recognition Diploma Thesis

Authors

  • Florian Kraft
  • Alex Waibel
  • Kristian Kroschel
  • Thomas Schaaf
  • Rob Malkin
Abstract

The detection of sound events is a key technology for a wide range of audio applications. Sounds can carry information beyond the limits of vision. A humanoid robot assigned kitchen tasks can therefore interact with its environment considerably better when it uses acoustics. While audio scene analysis comprises many subtopics, this thesis deals with the recognition of presegmented as well as continuous audio objects using single-channel microphone input. No further prior knowledge about scenes with single and multiple sources was used. This means that recognition is performed without information on the audio context, such as source positions or statistical information on typical event sequences. The three explored feature sets consisted of MFCCs with first- and second-order temporal derivatives, PCA-ICA features without temporal context, and PCA-ICA features on several context window sizes. In a first batch of experiments, these features were evaluated with GMMs and with forward and ergodic HMMs on predefined segments of single-source data recorded in different kitchens. The results show that for single-source data MFCC features perform worse than ICA features, independent of the classifier. Furthermore, ICA features covering temporal context gave even better results. The comparison of forward and ergodic models for different numbers of states revealed that the kitchen task class set generally favors ergodic HMMs over left-right models. Another experiment confirmed the superiority of ICA over MFCCs with respect to the number of Gaussian parameters. While the ICA features for an architecture which cover shared global inter-class properties appeared to be superior on single-source data, this benefit could not be shown under continuous real-world cooking conditions with background noise.
Scarce class occurrences in real-world data, in combination with low recognition performance, showed that source separation, confidence measures, and multi-track hypothesis output need to be considered in future research. Furthermore, the mapping of acoustic entities to semantics during labeling and training has to be performed carefully.
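
The idea of features "on several context window sizes" can be illustrated with a minimal sketch. It assumes frame-level features are already available as a NumPy array. The function names `stack_context` and `pca_project`, the edge padding, and the SVD-based PCA are illustrative assumptions, not the implementation used in the thesis, and the ICA stage implied by the "PCA-ICA" name is omitted here.

```python
import numpy as np


def stack_context(frames: np.ndarray, half_width: int) -> np.ndarray:
    """Concatenate each frame with its neighbours within +/- half_width frames.

    frames has shape (num_frames, dim). The sequence edges are padded by
    repeating the first and last frame (an assumption made for this sketch).
    Returns an array of shape (num_frames, (2 * half_width + 1) * dim).
    """
    if half_width == 0:
        return frames.copy()
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    # Each shifted view contributes one position of the context window.
    windows = [padded[i:i + num_frames] for i in range(2 * half_width + 1)]
    return np.hstack(windows)


def pca_project(features: np.ndarray, num_components: int) -> np.ndarray:
    """Project features onto their leading principal directions."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_components].T


# Hypothetical input: 200 frames of 13-dimensional features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
stacked = stack_context(frames, half_width=2)     # window of 5 frames
reduced = pca_project(stacked, num_components=20)
```

A larger `half_width` gives each feature vector more temporal context at the cost of a higher input dimension, which is why a dimensionality reduction step such as PCA is a natural companion in this sketch.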

Similar Resources

Object Recognition with local feature trajectories

This diploma thesis presents a novel approach for extracting a 3-D object description. In particular, representations based on local features generated from image sequences are studied. In a first step, different methods for detecting and tracking features are analysed. As a result, trajectories of local features are obtained, which are examined with respect to their quality and robustness. In a ...

Appearance-Based Features for Automatic Continuous Sign Language Recognition

This diploma thesis investigates appearance-based features for the person-independent vision-based recognition of continuous sign language. A large variety of methods which have been successfully used for automatic speech recognition is applied to this task. Appearance-based approaches do not rely on a segmentation of the images or on predefined models of the image content and use the image its...

An Integrated Tracking And Recognition Approach For Video

This diploma thesis investigates integrated tracking and recognition in continuous sign language recognition as well as efficient approximations to it. Current state-of-the-art sign language recognition systems use tracking only as a preprocessing step. Hence, tracking errors lead to recognition errors. We propose to integrate scoring functions of a model-free dynamic programming tracking framewo...

Patch-based Object Recognition

Acknowledgements. I would like to thank Prof. Ney for offering me the opportunity to work as a student researcher at the Chair of Computer Science 6, where I have been a member of the image recognition group since August 2004, and to write my diploma thesis at this department. I would also like to thank Prof. Seidl, who kindly agreed to co-supervise this work. For the excellent supervision of ...

Appearance-Based Gesture Recognition

This diploma thesis investigates the use of appearance-based features for the recognition of gestures using video input. Previous work in the field of gesture recognition usually first segmented parts of the input images (for example, the hand) and then used features calculated from this segmented input. Results in the field of object recognition in images suggest that this intermediate seg...


Publication date: 2005